Machine Learning Nanodegree


Capstone Project: Dogs vs. Cats App Implementation (He Weihua, Udacity)

2019.07.29

I. Problem Definition


Project Overview

Cats vs. Dogs comes from a Kaggle competition (a playground-style contest): https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition. Given the dataset, the task is to build an algorithm that distinguishes cats from dogs.

Our goal is to train a model on the training set, then "take the exam" on the test set and submit the predictions to Kaggle for scoring. This project uses a convolutional neural network to decide whether an image shows a cat or a dog, a binary classification problem: 1 means the image is classified as a dog, 0 as a cat.

Background: Kaggle has held the Dogs vs. Cats competition twice. The first, in 2013, used accuracy as the evaluation metric; the second, in 2017, switched to the log loss. By then deep learning had matured, and it is particularly well suited to image problems: had accuracy remained the metric, most entrants' models would have scored around 99% and been hard to tell apart. Log loss requires not only a correct classification but also high confidence in it, which clearly separates the models, especially the top ones. Each entrant therefore trains a model that, given a test image, outputs a probability: the closer to 1, the more likely the image is a dog; the closer to 0, the more likely it is a cat.

Convolutional neural networks (CNNs) are among the most representative architectures in deep learning and have been very successful in image processing; on the standard ImageNet dataset, many of the best-performing models are CNN-based. A CNN passes an image through repeated convolution and pooling layers, then produces two output nodes whose softmax gives the probability of each class.

The dataset comes from the Kaggle competition. The training set contains 12,500 images labeled cat and 12,500 labeled dog; the test set contains 12,500 unlabeled images. For each test image the model must predict the probability that it shows a dog (1 means dog, 0 means cat).

The final deliverable is a CNN-based model that classifies the test samples, with the results uploaded to Kaggle for judging. In addition, a Dogs vs. Cats app was built with Keras and Xcode: given a photo taken with the camera or picked from the library, it predicts the probability that the image shows a cat or a dog.

  • Input: a color image
  • Output: the probability that the image is a cat or a dog

Environment


The environment was set up with Anaconda: macOS, Jupyter Notebook, Python 3, Keras, Xcode, and so on. Model training was done on a cloud service.

Problem Statement


Dogs vs. Cats is a Kaggle playground competition. Our goal is to train a model on the training set, then "take the exam" on the test set and submit the predictions to Kaggle for a score; as of 2019 the competition (held two years ago) is closed, so a leaderboard rank is no longer available. This project uses a convolutional neural network to decide whether an image shows a cat or a dog, a binary classification problem: given an image, the algorithm must predict which of the predefined classes it belongs to. In computer vision, the current approach to this kind of problem is deep learning, and for image data specifically the CNN architecture, which excels at image recognition.

Most images in the dataset are normal, but a small number are anomalous or low-resolution, and these must be removed from the training set. Files are named type.num.jpg, e.g. cat.0.jpg; Keras's ImageDataGenerator expects each class in its own folder, so the images must be reorganized accordingly. Image sizes vary while the number of network input nodes is fixed, so every image must be resized before its pixels are fed in. A well-chosen classifier is needed to place high on the leaderboard.

Evaluation Metric


Log loss, also called logistic regression loss or cross-entropy loss, is a common evaluation measure. Cross-entropy measures the distance between two probability distributions and is a widely used loss for classification. Since this is a binary classification problem, log loss is adopted as the metric, computed as: $$\textrm{LogLoss} = - \frac{1}{n} \sum_{i=1}^n \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)\right]$$ where:

  • n is the number of images in the test set
  • $\hat{y}_i$ is the predicted probability that image i is a dog
  • $y_i$ is 1 if the image is a dog and 0 if it is a cat
  • log() is the natural (base e) logarithm

With a sigmoid output, cross-entropy avoids the gradient-vanishing problem that squared error suffers from when the activation saturates. The smaller the cross-entropy loss, the better the model. This metric is used to evaluate both the project's solution and the benchmark model.
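As a concrete reference, a minimal NumPy sketch of this metric (the helper name log_loss and its eps clipping are ours, mirroring the truncation Kaggle applies):

import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    """Binary cross-entropy averaged over n samples."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# a confident wrong answer costs far more than a cautious correct one
print(log_loss(np.array([1, 0]), np.array([0.9, 0.1])))    # ~0.105
print(log_loss(np.array([1, 0]), np.array([0.01, 0.99])))  # ~4.605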

II. Analysis


Data Exploration and Preprocessing

The Kaggle Dogs vs. Cats download unpacks into three files: train.zip, test.zip, and sample_submission.csv. The data consist of a training set and a test set. The training set contains 12,500 cat images and 12,500 dog images, named according to the pattern "type.num.jpg". The test set contains 12,500 unlabeled cat-and-dog images named "num.jpg"; note that test-set numbering starts at 1 while training-set numbering starts at 0. The final test-set predictions are written into submission.csv (in the format of sample_submission.csv) and uploaded to Kaggle for scoring. Concrete steps: download the training data from Dogs vs. Cats Redux: Kernels Edition into the image directory and unzip it there, then pull a few training images for visualization.

In [1]:
import os
os.chdir("{}/image".format(os.getcwd()))
In [2]:
%ls train | head
cat.0.jpg
cat.10000.jpg
cat.10001.jpg
cat.10002.jpg
cat.10003.jpg
cat.10004.jpg
cat.10005.jpg
cat.10006.jpg
cat.10007.jpg
cat.10008.jpg
ls: write error
In [3]:
%ls test | head
10000.jpg
10001.jpg
10002.jpg
10003.jpg
10004.jpg
10005.jpg
10006.jpg
10007.jpg
10008.jpg
10009.jpg
ls: write error
In [4]:
from keras.preprocessing.image import load_img
import random
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline  
plt.style.use('seaborn-white')

namelist = os.listdir('train/')
plt.figure(figsize=(12, 12))
for i in range(0, 16):
    plt.subplot(4, 4, i+1)
    j = random.randint(0, len(namelist) - 1)  # random valid index into the file list
    img = load_img('train/'+ namelist[j])
    plt.title(namelist[j])
    plt.axis('off')
    plt.imshow(img, interpolation='nearest')
Using TensorFlow backend.

Plot a scatter diagram of the image sizes in the training set.

In [5]:
from keras.preprocessing.image import img_to_array, load_img
import os
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')

targetnames = os.listdir('train/')
heights = []
widths = []
for name in targetnames:
    if not name.endswith('.jpg'):  # skip hidden files such as .DS_Store
        continue
    img = load_img('train/' + name)
    x = img_to_array(img)
    heights.append(x.shape[0])
    widths.append(x.shape[1])

x = np.array(widths)
y = np.array(heights)
area = np.pi * (35 * 0.05)**2  # dot's size

plt.scatter(x, y, s=area, c='red', alpha=1, marker = 'o')
plt.title('scatter diagram of picture size in train dataset')
plt.xlabel('width')
plt.ylabel('height')
plt.show()

List the training images whose height or width is under 50 pixels.

In [2]:
from keras.preprocessing.image import img_to_array, load_img
import os
import numpy as np
import matplotlib.pyplot as plt

targetnames = os.listdir('train/')
bad_picture = []
for name in targetnames:
    if not name.endswith('.jpg'):  # skip hidden files such as .DS_Store
        continue
    img = load_img('train/' + name)
    x = img_to_array(img)
    if x.shape[0] < 50 or x.shape[1] < 50:
        bad_picture.append(name)
print(bad_picture)
['dog.4367.jpg', 'dog.11248.jpg', 'dog.9246.jpg', 'dog.10747.jpg', 'cat.10392.jpg', 'dog.10733.jpg', 'cat.2433.jpg', 'cat.9171.jpg', 'dog.1324.jpg', 'dog.7011.jpg', 'dog.11686.jpg', 'cat.6402.jpg', 'cat.6614.jpg', 'cat.5527.jpg', 'cat.4821.jpg', 'cat.5534.jpg', 'dog.9705.jpg', 'dog.2652.jpg']
In [3]:
import matplotlib.pyplot as plt
from keras.preprocessing.image import load_img
# from keras.preprocessing import image
from math import ceil
plt.style.use('seaborn-white')
        
def show_img_list(img_list, size = (12, 12)):
    print("Len img_list: {}".format(len(img_list)))
    plt.figure(figsize=size)
    subplot_row = ceil(len(img_list) / 5)
    for i in range(0, len(img_list)):
        plt.subplot(subplot_row, 5, i+1)
        img = load_img('train/'+ img_list[i])
        plt.title(img_list[i])
        plt.axis('off')
        plt.imshow(img)
        
show_img_list(bad_picture)
Len img_list: 18

Plot a scatter diagram of the image sizes in the test set.

In [8]:
from keras.preprocessing.image import img_to_array, load_img
import os
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')

targetnames = os.listdir('test/')
heights = []
widths = []
for name in targetnames:
    if name.endswith('.jpg'):  # process only image files; skips items like .DS_Store
        img = load_img('test/'+name)
        x = img_to_array(img)
        heights.append(x.shape[0])
        widths.append(x.shape[1])

x = np.array(widths)
y = np.array(heights)
area = np.pi * (35 * 0.05)**2  # dot's size

plt.scatter(x, y, s=area, c='r', alpha=1, marker = 'o')
plt.title('scatter diagram of picture size in test dataset')
plt.xlabel('width')
plt.ylabel('height')
plt.show()

An IQR analysis of each image's color-to-pixel ratio surfaces many low-resolution or irrelevant images; the unqualified ones need to be deleted.

In [6]:
from PIL import Image
import os
import numpy as np
import shutil
from collections import Counter

targetnames = os.listdir('train/')
ratio_list = []

for name in targetnames:
    if not name.endswith('.jpg'):  # skip hidden files such as .DS_Store
        continue
    im = Image.open('train/' + name)
    x = im.histogram(mask=None)
    count = Counter(x)
    ratio_list.append(float(len(count)) / len(x))
In [7]:
import numpy as np

q99, q01 = np.percentile(ratio_list, [99, 1])

print(q99, q01)
0.8020833333333334 0.20052083333333334
In [8]:
from keras.preprocessing.image import img_to_array, load_img
import shutil
import os
import matplotlib.pyplot as plt
from math import ceil
%matplotlib inline

plt.style.use('seaborn-white')

outlier_picture = []
targetnames = os.listdir('train/')
for name in targetnames:
    if not name.endswith('.jpg'):  # skip hidden files such as .DS_Store
        continue
    im = Image.open('train/' + name)
    x = im.histogram(mask=None)
    count = Counter(x)
    if float(len(count))/len(x) < q01:
        outlier_picture.append(name)
#         img = load_img('train/'+name)     
#         plt.title(name)
#         plt.imshow(img)
#         plt.show()
        
print(outlier_picture)
['dog.7322.jpg', 'dog.8987.jpg', 'dog.3805.jpg', 'cat.2095.jpg', 'dog.4367.jpg', 'dog.2892.jpg', 'dog.11248.jpg', 'cat.3980.jpg', 'cat.5671.jpg', 'dog.4777.jpg', 'cat.6386.jpg', 'dog.1197.jpg', 'dog.9246.jpg', 'cat.3567.jpg', 'cat.11879.jpg', 'dog.1381.jpg', 'cat.4963.jpg', 'dog.8428.jpg', 'dog.2139.jpg', 'dog.12185.jpg', 'dog.10989.jpg', 'dog.9130.jpg', 'cat.8744.jpg', 'cat.8585.jpg', 'dog.990.jpg', 'dog.3147.jpg', 'dog.9456.jpg', 'cat.10807.jpg', 'dog.12178.jpg', 'dog.1546.jpg', 'dog.11465.jpg', 'dog.10747.jpg', 'dog.10155.jpg', 'cat.7703.jpg', 'dog.9536.jpg', 'cat.2691.jpg', 'dog.2476.jpg', 'cat.3216.jpg', 'cat.575.jpg', 'cat.10392.jpg', 'dog.561.jpg', 'cat.3410.jpg', 'cat.2939.jpg', 'cat.5529.jpg', 'dog.5645.jpg', 'dog.12223.jpg', 'dog.10385.jpg', 'dog.4980.jpg', 'dog.10637.jpg', 'dog.7127.jpg', 'cat.7317.jpg', 'dog.10225.jpg', 'cat.11504.jpg', 'dog.943.jpg', 'dog.8570.jpg', 'dog.10190.jpg', 'dog.1353.jpg', 'dog.407.jpg', 'cat.7314.jpg', 'cat.8594.jpg', 'cat.11263.jpg', 'dog.7369.jpg', 'cat.2753.jpg', 'dog.10733.jpg', 'dog.5797.jpg', 'dog.6473.jpg', 'dog.1308.jpg', 'dog.6301.jpg', 'dog.69.jpg', 'cat.8848.jpg', 'dog.9999.jpg', 'dog.3536.jpg', 'cat.3699.jpg', 'cat.273.jpg', 'cat.5954.jpg', 'dog.12331.jpg', 'cat.2977.jpg', 'dog.6504.jpg', 'cat.44.jpg', 'dog.4924.jpg', 'dog.81.jpg', 'cat.7588.jpg', 'cat.2433.jpg', 'cat.3739.jpg', 'cat.8724.jpg', 'dog.927.jpg', 'cat.9837.jpg', 'dog.7421.jpg', 'dog.6059.jpg', 'dog.5746.jpg', 'cat.9589.jpg', 'cat.8456.jpg', 'cat.664.jpg', 'dog.6845.jpg', 'dog.881.jpg', 'cat.5403.jpg', 'dog.3524.jpg', 'cat.11161.jpg', 'dog.10654.jpg', 'dog.6299.jpg', 'dog.7378.jpg', 'cat.9171.jpg', 'dog.2566.jpg', 'cat.3886.jpg', 'cat.3845.jpg', 'cat.11942.jpg', 'dog.4507.jpg', 'cat.11177.jpg', 'cat.3716.jpg', 'cat.1840.jpg', 'dog.6112.jpg', 'dog.5427.jpg', 'cat.7968.jpg', 'dog.2188.jpg', 'dog.1324.jpg', 'dog.3255.jpg', 'dog.5618.jpg', 'cat.8087.jpg', 'dog.3335.jpg', 'cat.8044.jpg', 'cat.8534.jpg', 'dog.12322.jpg', 'dog.7374.jpg', 'cat.10107.jpg', 'dog.284.jpg', 'cat.11184.jpg', 'dog.6650.jpg', 'dog.1895.jpg', 'cat.8470.jpg', 'dog.11747.jpg', 'cat.4921.jpg', 'cat.1859.jpg', 'dog.7765.jpg', 'cat.9578.jpg', 'dog.11237.jpg', 'cat.8935.jpg', 'cat.8504.jpg', 'dog.8736.jpg', 'dog.6733.jpg', 'cat.9624.jpg', 'dog.1935.jpg', 'dog.4468.jpg', 'cat.4670.jpg', 'dog.5602.jpg', 'dog.2390.jpg', 'cat.10925.jpg', 'dog.6685.jpg', 'dog.10664.jpg', 'dog.7772.jpg', 'dog.2965.jpg', 'dog.1920.jpg', 'cat.10893.jpg', 'dog.10274.jpg', 'dog.11142.jpg', 'cat.9635.jpg', 'cat.7630.jpg', 'cat.9609.jpg', 'cat.10854.jpg', 'dog.5015.jpg', 'dog.296.jpg', 'cat.10277.jpg', 'dog.3074.jpg', 'cat.48.jpg', 'cat.9967.jpg', 'cat.11342.jpg', 'cat.9595.jpg', 'dog.7011.jpg', 'dog.8450.jpg', 'cat.485.jpg', 'cat.3641.jpg', 'dog.10729.jpg', 'dog.4336.jpg', 'dog.11184.jpg', 'cat.5754.jpg', 'dog.3115.jpg', 'dog.4134.jpg', 'dog.11609.jpg', 'cat.6263.jpg', 'cat.4306.jpg', 'cat.1726.jpg', 'cat.8138.jpg', 'dog.5604.jpg', 'cat.2165.jpg', 'cat.8448.jpg', 'dog.3088.jpg', 'cat.7034.jpg', 'cat.6699.jpg', 'dog.9188.jpg', 'dog.531.jpg', 'dog.1259.jpg', 'cat.5780.jpg', 'dog.12303.jpg', 'dog.1028.jpg', 'dog.7459.jpg', 'cat.4360.jpg', 'dog.11686.jpg', 'dog.3429.jpg', 'cat.4994.jpg', 'dog.1174.jpg', 'cat.146.jpg', 'cat.6402.jpg', 'cat.1423.jpg', 'cat.6614.jpg', 'cat.11045.jpg', 'cat.5527.jpg', 'cat.4363.jpg', 'dog.11849.jpg', 'cat.596.jpg', 'cat.11331.jpg', 'dog.9517.jpg', 'dog.4972.jpg', 'dog.8152.jpg', 'cat.11091.jpg', 'cat.4821.jpg', 'dog.10001.jpg', 'dog.6755.jpg', 'dog.11119.jpg', 'dog.1012.jpg', 'cat.5534.jpg', 'dog.5670.jpg', 
'dog.7926.jpg', 'dog.9512.jpg', 'cat.2663.jpg', 'dog.11.jpg', 'cat.183.jpg', 'cat.11485.jpg', 'dog.182.jpg', 'cat.2845.jpg', 'cat.11484.jpg', 'dog.11252.jpg', 'dog.9705.jpg', 'cat.7487.jpg', 'cat.11094.jpg', 'dog.4427.jpg', 'cat.10175.jpg', 'cat.8749.jpg', 'cat.4629.jpg', 'dog.2068.jpg', 'dog.2652.jpg', 'cat.1631.jpg', 'dog.12.jpg', 'cat.2674.jpg', 'cat.4577.jpg', 'dog.9288.jpg', 'dog.630.jpg', 'cat.4833.jpg', 'dog.11457.jpg', 'dog.7893.jpg', 'cat.10809.jpg']

Inspection shows that most of the dataset is normal; the small set of anomalous images below must be removed from the training set.

In [4]:
from keras.preprocessing.image import img_to_array, load_img
import shutil
import matplotlib.pyplot as plt
import os

# pick out the outliers & low resolution pictures
pick_out_outlier_list1 = ['cat.12272.jpg', 'cat.3868.jpg', 'dog.10190.jpg', 'dog.1773.jpg', 'dog.4507.jpg', 'dog.2422.jpg', 
                         'dog.10123.jpg', 'dog.6475.jpg', 'cat.10700.jpg', 'dog.8898.jpg', 'dog.3889.jpg','cat.3216.jpg',
                         'cat.12476.jpg', 'dog.12376.jpg', 'cat.10712.jpg', 'dog.11299.jpg', 'cat.2433.jpg', 'cat.8456.jpg', 
                         'cat.6345.jpg', 'dog.5604.jpg', 'dog.10747.jpg', 'dog.9188.jpg', 'dog.10161.jpg', 'dog.8736.jpg', 
                         'cat.9171.jpg', 'dog.1194.jpg', 'cat.7564.jpg', 'cat.10365.jpg', 'dog.12155.jpg', 'cat.10029.jpg', 
                         'dog.6405.jpg', 'dog.9517.jpg', 'dog.7076.jpg', 'cat.4688.jpg', 'dog.1259.jpg', 'dog.2614.jpg', 
                         'dog.9931.jpg', 'dog.1043.jpg', 'cat.11607.jpg', 'cat.5351.jpg', 'dog.10801.jpg', 'cat.8921.jpg',
                         'dog.9561.jpg', 'cat.7377.jpg', 'cat.5418.jpg', 'cat.7968.jpg', 'cat.4338.jpg', 'cat.7899.jpg', 
                         'cat.11184.jpg', 'cat.3672.jpg', 'dog.4367.jpg', 'dog.1895.jpg', 'dog.4218.jpg','dog.10237.jpg','dog.9705.jpg']
                         
pick_out_outlier_list2 = ['cat.343.jpg', 'dog.4872.jpg', 'cat.4741.jpg', 'cat.10636.jpg', 'dog.8421.jpg', 'cat.10620.jpg', 
                         'dog.10733.jpg', 'dog.7169.jpg', 'cat.1797.jpg', 'dog.6725.jpg', 'dog.2339.jpg', 'cat.2520.jpg', 
                         'cat.2337.jpg', 'cat.2835.jpg', 'cat.1616.jpg', 'cat.5806.jpg', 'dog.7805.jpg', 'dog.1182.jpg', 
                         'cat.7919.jpg', 'cat.3123.jpg', 'cat.7730.jpg', 'dog.3341.jpg', 'cat.5485.jpg', 'dog.3429.jpg',
                         'cat.8044.jpg', 'cat.7487.jpg', 'cat.8138.jpg', 'dog.5618.jpg', 'dog.12331.jpg', 'dog.11686.jpg', 
                         'cat.5527.jpg', 'cat.7703.jpg', 'cat.6402.jpg', 'cat.6263.jpg', 'cat.44.jpg', 'cat.6699.jpg', 
                         'cat.9624.jpg', 'dog.3074.jpg', 'cat.8087.jpg', 'dog.7772.jpg', 'dog.10225.jpg', 'cat.5954.jpg', 
                         'dog.11248.jpg', 'cat.2753.jpg', 'cat.11091.jpg', 'dog.9246.jpg', 'dog.10001.jpg', 'dog.10155.jpg',
                         'dog.6685.jpg', 'cat.9589.jpg', 'dog.11465.jpg', 'cat.4821.jpg']

# show outlier images
show_img_list(pick_out_outlier_list1, (12, 20))
show_img_list(pick_out_outlier_list2, (12, 20))
Len img_list: 55
Len img_list: 52
In [14]:
# remove these pictures out of the train/ directory
def rmrf_mkdir(dirname):
    if os.path.exists(dirname):
        shutil.rmtree(dirname)
    os.mkdir(dirname)


def rm_outlier_imgs(outlier_list):
    for i in range(0, len(outlier_list)):
        img_name = outlier_list[i]
        shutil.move('train/' + img_name, 'outlier/' + img_name)
        
rmrf_mkdir('outlier')  
rm_outlier_imgs(pick_out_outlier_list1)
rm_outlier_imgs(pick_out_outlier_list2)

Exploring the images further, the shooting angles of the cats and dogs vary, as does the fraction of each image they occupy. To keep the model from being distracted by these factors and to improve generalization, random transformations such as rotation, shear, zoom, and horizontal flips can be applied to the original images, as sketched below.
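As a sketch of how such random transformations can be configured in Keras (this project ultimately does not retrain the convolutional base, and the parameter values below are illustrative, not tuned):

from keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,      # random rotations up to 20 degrees
    shear_range=0.2,        # random shear transforms
    zoom_range=0.2,         # random zooms
    horizontal_flip=True)   # random horizontal flips

# e.g. augmenter.flow_from_directory('img_train', target_size=(224, 224), batch_size=16)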

Files are named type.num.jpg, e.g. cat.0.jpg, while Keras's ImageDataGenerator expects each class in its own folder. Following the symbolic-link approach from Yang Peiwen's blog, the dataset is reorganized with symlinks, which avoids copying the images and wasting disk space.

In [12]:
import os
import shutil

train_filenames = os.listdir('train')
train_cat = filter(lambda x:x[:3] == 'cat', train_filenames)
train_dog = filter(lambda x:x[:3] == 'dog', train_filenames)

def rmrf_mkdir(dirname):
    if os.path.exists(dirname):
        shutil.rmtree(dirname)
    os.mkdir(dirname)

rmrf_mkdir('img_train')
os.mkdir('img_train/cat')
os.mkdir('img_train/dog')

rmrf_mkdir('img_test')
os.symlink('../test/', 'img_test/test')

for filename in train_cat:
    os.symlink('../../train/'+filename, 'img_train/cat/'+filename)

for filename in train_dog:
    os.symlink('../../train/'+filename, 'img_train/dog/'+filename)


After reorganizing, the directory layout is:

image
├── test
├── img_test
│   └── test
├── train
└── img_train
    ├── cat
    └── dog

Visualize the dataset:

In [13]:
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

plt.style.use('ggplot')

x = ['train_cat', 'train_dog', 'test']
y = [len(os.listdir('img_train/cat')), len(os.listdir('img_train/dog')), len(os.listdir('img_test/test'))]
ax = sns.barplot(x=x, y=y)
In [14]:
s_count = """image数据集中,猫的数量:{},狗的数量:{},测试集图片数量:{}""".format(len(os.listdir('img_train/cat')), len(os.listdir('img_train/dog')),len(os.listdir('img_test/test')))
s_count
Out[14]:
'In the image dataset there are 12446 cats, 12447 dogs, and 12500 test images'

Problem Analysis and Model Selection


Given an image, or one frame of a video stream, the model predicts which of the predefined classes the image belongs to. In computer vision the core technique for this is deep learning, and for image data specifically the convolutional neural network (ConvNet) architecture. Common CNN architectures look like the following:

In short, a CNN is a special network structure in which convolution operations learn image features automatically, selecting the visual features that maximize classification accuracy.

The figure above shows a simple cat-vs-dog CNN. The bottom (and largest) block is the input layer, which reads the image into the network. The top block is the output layer, which predicts the class of the input image; since only cats and dogs must be distinguished, it has just two units. Everything between the input and output layers is a hidden layer; the figure has three, all implemented by convolutions, so they are called convolutional layers. The input layer, convolutional layers, and output layer, together with their parameters, form a typical CNN. Networks used in practice are far more complex: since the 2012 ImageNet competition, new architectures have appeared almost every year, and widely adopted ones include AlexNet, VGG-Net, GoogLeNet, Inception V2-V4, and ResNet. All of these perform outstandingly on ImageNet; their accuracy and model size are shown in the figure below.
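As a concrete sketch of the simple network just described, with three convolution/pooling stages and a two-unit softmax output (the filter counts here are illustrative, not the project's actual model):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

simple_cnn = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(2, activation='softmax'),  # two units: P(cat), P(dog)
])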

The convolution and pooling layers mainly extract the image's geometric features: shallow layers capture simple abstractions such as lines and corners, deep layers capture complex abstractions such as faces, and the final fully connected layers do the actual classification. We can therefore use an already-trained CNN to extract the complex geometric features of an image: feed the original images through the trained network, take the output as new input, and attach our own fully connected layer for classification. During training, only the newly added layer's weights change.

Because each network extracts different features, this project concatenates the outputs of several networks as the input to the final fully connected layer, which effectively reduces variance.

The transfer-learning part uses Keras, whose importable models include Xception, VGG16, VGG19, ResNet50, InceptionV3, InceptionResNetV2, and MobileNet. Weighing classification accuracy against model size, ResNet50, InceptionV3, Xception, and InceptionResNetV2 were chosen as the base models.

III. Implementation


Generating the Transfer-Learning Feature Vectors

Among the pretrained models in the Keras documentation, those with the best ImageNet accuracy are InceptionResNetV2, Xception, InceptionV3, and ResNet50, so these four are used for the ensemble. Each has its own input defaults: for image size, Xception, InceptionV3, and InceptionResNetV2 expect 299x299 and ResNet50 expects 224x224 (to use a different size, simply pass it in); for pixel values, Xception, InceptionV3, and InceptionResNetV2 expect the range (-1, 1). ResNet50 instead requires centering: since its loaded weights were trained on ImageNet, the ImageNet channel mean is subtracted from every pixel. Whenever the raw values do not match a model's defaults, that model's preprocess_input function converts the image into its standard input; the difference is sketched below.
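As a quick sanity check of these differences, a minimal sketch (assuming the keras.applications modules used in this project; exact ResNet50 behavior varies slightly across Keras versions):

import numpy as np
from keras.applications import xception, resnet50

x = np.array([[[[0.0, 127.5, 255.0]]]])  # a single RGB pixel

# Xception / InceptionV3 / InceptionResNetV2 scale pixels into (-1, 1)
print(xception.preprocess_input(x.copy()))   # -> [[[[-1., 0., 1.]]]]

# ResNet50 preprocessing is ImageNet mean subtraction (with an RGB->BGR flip in Keras's version)
print(resnet50.preprocess_input(x.copy()))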

In a typical CNN the first many layers are convolution/pooling layers (and their variants) and the last few are fully connected; the layers before the fully connected ones are called the bottleneck. Running a new image through the trained network up to the bottleneck can be viewed as feature extraction. To save memory and speed up computation, a global average pooling is usually applied before the bottleneck output enters the fully connected layers: ResNet50's bottleneck output is 7x7x2048, which would mean a huge number of parameters if fed directly into a dense layer, so a global average pooling reduces it to 1x1x2048. This also reduces overfitting.

In Keras, loading a model with include_top=False, pooling='avg' does exactly this. Each model turns an image into a 1x2048 row vector, and concatenating the four gives a 1x8192 row vector, the final result of preprocessing; in NumPy terms the two steps look like the sketch below.
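A minimal NumPy stand-in for those two steps (random arrays in place of real features):

import numpy as np

bottleneck = np.random.rand(7, 7, 2048)    # e.g. ResNet50 bottleneck output
gap = bottleneck.mean(axis=(0, 1))         # global average pooling -> (2048,)

# pooled features from the four models, concatenated into one 8192-d row vector
features = [np.random.rand(1, 2048) for _ in range(4)]
merged = np.concatenate(features, axis=1)
print(gap.shape, merged.shape)             # (2048,) (1, 8192)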

In [15]:
import math
from keras.models import *
from keras.layers import *
from keras.applications import *
from keras.applications.inception_resnet_v2 import InceptionResNetV2
from keras.preprocessing.image import *
import h5py

# ResNet50 preprocessing: subtract the ImageNet per-channel mean
def preprocess_input(x):
    return x - [103.939, 116.779, 123.68]

def write_gap(MODEL, image_size, func=None):
    width = image_size[0]
    height = image_size[1]
    inputs = Input((height, width, 3))
    x = inputs
    if func:
        # prepend a preprocessing layer: mean subtraction for ResNet50,
        # the model's own preprocess_input for the others
        if MODEL.__name__ == 'ResNet50':
            x = Lambda(preprocess_input, name='preprocessing')(x)
        else:
            x = Lambda(func)(x)

    base_model = MODEL(input_tensor=x, weights='imagenet', include_top=False)
    model = Model(base_model.input, GlobalAveragePooling2D()(base_model.output))

    gen = ImageDataGenerator()
    train_generator = gen.flow_from_directory("img_train", image_size, shuffle=False, batch_size=16)
    test_generator = gen.flow_from_directory("img_test", image_size, shuffle=False, batch_size=16, class_mode=None)
    # predict_generator expects the number of batches (steps), not the number of samples
    train = model.predict_generator(train_generator, math.ceil(train_generator.samples / 16))
    test = model.predict_generator(test_generator, math.ceil(test_generator.samples / 16))

    with h5py.File("gap_%s.h5" % MODEL.__name__) as h:
        h.create_dataset("train", data=train)
        h.create_dataset("test", data=test)
        h.create_dataset("label", data=train_generator.classes)
In [16]:
write_gap(ResNet50, (224, 224), preprocess_input)
WARNING:tensorflow:From /usr/local/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:1344: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Found 24893 images belonging to 2 classes.
Found 12500 images belonging to 1 classes.
In [17]:
write_gap(InceptionV3, (299, 299), inception_v3.preprocess_input)
Found 24893 images belonging to 2 classes.
Found 12500 images belonging to 1 classes.
In [18]:
write_gap(Xception, (299, 299), xception.preprocess_input)
Found 24893 images belonging to 2 classes.
Found 12500 images belonging to 1 classes.
In [19]:
write_gap(InceptionResNetV2, (299, 299), inception_resnet_v2.preprocess_input)
Found 24893 images belonging to 2 classes.
Found 12500 images belonging to 1 classes.

Loading the Feature Vectors


The code above produced four feature-vector files:

  • gap_ResNet50.h5
  • gap_InceptionV3.h5
  • gap_Xception.h5
  • gap_InceptionResNetV2.h5

These feature vectors are loaded and merged into one feature matrix. Remember to shuffle X and y together, otherwise setting validation_split later will cause problems. The NumPy random seed is set to 2019.

In [40]:
import h5py
import numpy as np
from sklearn.utils import shuffle
np.random.seed(2019)

X_train = []
X_test = []

for filename in ["gap_ResNet50.h5", "gap_Xception.h5", "gap_InceptionV3.h5", "gap_InceptionResNetV2.h5"]:
    with h5py.File(filename, 'r') as h:
        X_train.append(np.array(h['train']))
        X_test.append(np.array(h['test']))
        y_train = np.array(h['label'])

X_train = np.concatenate(X_train, axis=1)
X_test = np.concatenate(X_test, axis=1)
X_train, y_train = shuffle(X_train, y_train)

Building the Model


After loading the preprocessed features, a dropout with rate 0.5 is applied to guard against overfitting, followed directly by the output layer with a sigmoid activation; the optimizer is Adadelta. The model outputs a single scalar: the probability that the image contains a dog.

In [41]:
from keras.models import *
from keras.layers import *
from keras.regularizers import *

input_tensor = Input(X_train.shape[1:])
x = input_tensor
x = Dropout(0.5)(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(input_tensor, x)

model.compile(optimizer='adadelta',
              loss='binary_crossentropy',
              metrics=['accuracy'])

The overall structure of the transfer-learning network is shown below.

Training the Model


With the model built, training can begin; the validation split is set to 20%.

About the batch_size parameter: if it is too small, learning becomes too random; each step is fast, but convergence oscillates and the resulting model is unreliable. Within a reasonable range, increasing batch_size improves memory utilization and the parallel efficiency of the large matrix multiplications, reduces the number of iterations needed per epoch (one full pass over the data), and makes the descent direction more accurate with less training oscillation. Here batch_size=128 is a suitable empirical value (training runs on a cloud platform).

In [43]:
from keras.callbacks import *
# model_history = model.fit(X_train, y_train, batch_size=128, nb_epoch=8, verbose=1, validation_split=0.2, callbacks = [TensorBoard(log_dir='./Graph')])

from keras.callbacks import ModelCheckpoint  
checkpointer = ModelCheckpoint(filepath='weights.best.cat_vs_dog.hdf5', verbose=1, save_best_only=True)
model_history = model.fit(X_train, y_train, epochs=8, batch_size=128, validation_split=0.2, callbacks=[checkpointer], verbose=1)
Train on 19914 samples, validate on 4979 samples
Epoch 1/8
19712/19914 [============================>.] - ETA: 0s - loss: 0.1051 - acc: 0.9698Epoch 00001: val_loss improved from inf to 0.02436, saving model to weights.best.cat_vs_dog.hdf5
19914/19914 [==============================] - 8s 384us/step - loss: 0.1044 - acc: 0.9701 - val_loss: 0.0244 - val_acc: 0.9952
Epoch 2/8
19200/19914 [===========================>..] - ETA: 0s - loss: 0.0205 - acc: 0.9950Epoch 00002: val_loss improved from 0.02436 to 0.01567, saving model to weights.best.cat_vs_dog.hdf5
19914/19914 [==============================] - 1s 48us/step - loss: 0.0209 - acc: 0.9949 - val_loss: 0.0157 - val_acc: 0.9952
Epoch 3/8
19328/19914 [============================>.] - ETA: 0s - loss: 0.0146 - acc: 0.9961Epoch 00003: val_loss improved from 0.01567 to 0.01464, saving model to weights.best.cat_vs_dog.hdf5
19914/19914 [==============================] - 1s 47us/step - loss: 0.0145 - acc: 0.9961 - val_loss: 0.0146 - val_acc: 0.9958
Epoch 4/8
19456/19914 [============================>.] - ETA: 0s - loss: 0.0128 - acc: 0.9961Epoch 00004: val_loss improved from 0.01464 to 0.01459, saving model to weights.best.cat_vs_dog.hdf5
19914/19914 [==============================] - 1s 48us/step - loss: 0.0127 - acc: 0.9961 - val_loss: 0.0146 - val_acc: 0.9954
Epoch 5/8
19328/19914 [============================>.] - ETA: 0s - loss: 0.0111 - acc: 0.9968Epoch 00005: val_loss improved from 0.01459 to 0.01395, saving model to weights.best.cat_vs_dog.hdf5
19914/19914 [==============================] - 1s 48us/step - loss: 0.0110 - acc: 0.9968 - val_loss: 0.0140 - val_acc: 0.9958
Epoch 6/8
19456/19914 [============================>.] - ETA: 0s - loss: 0.0109 - acc: 0.9969Epoch 00006: val_loss did not improve
19914/19914 [==============================] - 1s 47us/step - loss: 0.0107 - acc: 0.9969 - val_loss: 0.0146 - val_acc: 0.9958
Epoch 7/8
19456/19914 [============================>.] - ETA: 0s - loss: 0.0098 - acc: 0.9972Epoch 00007: val_loss did not improve
19914/19914 [==============================] - 1s 46us/step - loss: 0.0097 - acc: 0.9972 - val_loss: 0.0159 - val_acc: 0.9958
Epoch 8/8
19584/19914 [============================>.] - ETA: 0s - loss: 0.0098 - acc: 0.9972Epoch 00008: val_loss did not improve
19914/19914 [==============================] - 1s 46us/step - loss: 0.0096 - acc: 0.9972 - val_loss: 0.0145 - val_acc: 0.9960

The loss and accuracy during training:

In [45]:
import matplotlib.pyplot as plt
%matplotlib inline
# plot accuracy and loss curves
def plot_training(history):
    acc = history.history['acc']
    val_acc = history.history['val_acc']
    epochs = range(len(acc))
    plt.plot(epochs, acc, 'b')
    plt.plot(epochs, val_acc, 'r')
    plt.legend(["acc", "val_acc"], loc='best')
    plt.title('Training and validation accuracy')
    plt.figure()

    loss = history.history['loss']
    val_loss = history.history['val_loss']   
    plt.plot(epochs, loss, 'b')
    plt.plot(epochs, val_loss, 'r')
    plt.legend(["loss", "val_loss"], loc='best')
    plt.title('Training and validation loss')
    plt.show()

# training accuracy/loss curves
plot_training(model_history)

Training is fast, finishing within ten seconds, and the accuracy is high, peaking at 99.60% on the validation set. That is only 4 mistakes per thousand images, a very good result.

Predicting on the Test Set


The trained model classifies the preprocessed test data, giving the probability that each image contains a dog. Before submitting to Kaggle, numpy.clip() truncates all probabilities into $[0.005, 0.995]$, which slightly lowers the loss.

Kaggle's official metric is LogLoss; the expression below is its definition for binary classification.

$$\textrm{LogLoss} = - \frac{1}{n} \sum_{i=1}^n \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)\right]$$

$|\log(1)| = 0.000$

$|\log(0.995)| = 0.0050$

$\lim_{x\to 0}|\log(x)| = \infty$

$|\log(0.005)| = 5.2983$

Since Kaggle itself truncates probabilities at a minimum of $10^{-15}$, and $|\log(10^{-15})| = 34.5388$, clipping a correct prediction from 1 down to $0.995$ adds at most $0.0050$ to that image's loss, while clipping a wrong prediction from $10^{-15}$ up to $0.005$ reduces the loss enormously.
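A quick numeric check of this trade-off (each image contributes $-\log p$ for its true class):

import numpy as np

print(-np.log(0.995))  # correct prediction clipped from 1:    ~0.0050 extra loss
print(-np.log(1e-15))  # confidently wrong, Kaggle's floor:    ~34.5388
print(-np.log(0.005))  # wrong but clipped to 0.005:           ~5.2983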

The order of the test-set image ids:

In [46]:
%ls test | head -n 10
10000.jpg
10001.jpg
10002.jpg
10003.jpg
10004.jpg
10005.jpg
10006.jpg
10007.jpg
10008.jpg
10009.jpg
ls: write error

Kaggle matches each image by its id, so each filename is parsed into an id, the prediction is written into df, and the result is exported as a CSV.

In [47]:
import pandas as pd
from keras.preprocessing.image import *

df = pd.read_csv("sample_submission.csv")

image_size = (224, 224)
gen = ImageDataGenerator()
test_generator = gen.flow_from_directory("img_test", image_size, shuffle=False, 
                                         batch_size=16, class_mode=None)

y_pred = model.predict(X_test, verbose=1)
y_pred = y_pred.clip(min=0.005, max=0.995)

for i, fname in enumerate(test_generator.filenames):
    index = int(fname[fname.rfind('/')+1:fname.rfind('.')])
    df.at[index-1, 'label'] = float(y_pred[i])  # .at replaces the deprecated set_value

df.to_csv('submission.csv', index=None)
df.head(10)
Found 12500 images belonging to 1 classes.
12500/12500 [==============================] - 3s 206us/step
Out[47]:
   id  label
0   1  0.995
1   2  0.995
2   3  0.995
3   4  0.995
4   5  0.005
5   6  0.005
6   7  0.005
7   8  0.005
8   9  0.005
9  10  0.005

Submitting the test-set results to Kaggle gives a loss of 0.03556, close to the validation loss.

IV. App Implementation: Exporting the Classifier and the CAM Visualization Model


Here a model trained on ResNet50 features is used for prediction and for visualizing class activations on the training images.

Loading the Dataset

  • Images are resized to 224x224 by scaling rather than center-cropping: cropping an oversized image can leave only the middle region, losing the dog's head or tail, which would clearly hurt what the model learns; scaling keeps the important parts of the cat or dog.
  • Note that plain cat/dog classification uses a one-dimensional y, but the exported classifier uses two dimensions, encoding cat as [1, 0] and dog as [0, 1], which is required for class-specific CAM visualization.
  • With a single sigmoid neuron for binary classification, the model only raises the weights associated with "dog"; to the model, cats are indistinguishable from background. Two output dimensions with a softmax activation are therefore needed: if cats were treated as background, a sigmoid output could never make the cat response larger than the dog response.
In [229]:
import cv2
import numpy as np
from tqdm import tqdm

    
img_size = 224
def load_train(img_size):
    
    # keep only jpg files so stray items like .DS_Store are skipped
    train = [f for f in os.listdir('train') if f.endswith('.jpg')]
    n = len(train)
    X = np.zeros((n, img_size, img_size, 3), dtype=np.uint8)
    y = np.zeros((n, 2), dtype=np.uint8)  # one-hot encoding: cat -> (1, 0), dog -> (0, 1)
    for i, filename in enumerate(train):
        if filename[:3] == 'cat':
            y[i] = [1, 0]
        elif filename[:3] == 'dog':
            y[i] = [0, 1]
        X[i] = cv2.resize(cv2.imread('train/%s' % filename), (img_size, img_size))

    return X, y
X_train, y_train = load_train(img_size)
print("Training samples loaded; number of training images: {}".format(len(X_train)))
Training samples loaded; number of training images: 24893

Feature Extraction

In [230]:
from keras.layers import *
from keras.models import *
from keras.applications import *
from keras.optimizers import *
from keras.regularizers import *
from keras.applications.resnet50 import preprocess_input


# preprocessing: subtract the ImageNet per-channel mean from every pixel
def preprocess_input(x):
    return x - [103.939, 116.779, 123.68]

def get_features(MODEL, data=X_train):
    
    cnn_model = MODEL(include_top=False, input_shape=(224, 224, 3), weights='imagenet')  
    inputs = Input((224, 224, 3))
    x = inputs
    x = Lambda(preprocess_input, name = 'preprocessing')(x)  # the preprocess_input function differs per pretrained model
    x = cnn_model(x)
    x = GlobalAveragePooling2D()(x)
    cnn_model = Model(inputs, x)
    
    features = cnn_model.predict(data, batch_size = 64, verbose=1)
    return features
    
features = get_features(ResNet50)
24893/24893 [==============================] - 1609s 65ms/step
In [234]:
inputs = Input(features.shape[1:])
x = inputs
x = Dropout(0.5)(x)
x = Dense(2, activation = 'softmax', kernel_regularizer = l2(1e-4), bias_regularizer = l2(1e-4))(x)
model = Model(inputs, x, name = 'prediction')
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
h = model.fit(features, y_train, batch_size = 128, epochs = 8, validation_split = 0.2)
# model summary
model.summary()
Train on 19914 samples, validate on 4979 samples
Epoch 1/8
19914/19914 [==============================] - 57s 3ms/step - loss: 0.1131 - acc: 0.9544 - val_loss: 0.0276 - val_acc: 0.9894
Epoch 2/8
19914/19914 [==============================] - 1s 47us/step - loss: 0.0474 - acc: 0.9842 - val_loss: 0.0246 - val_acc: 0.9912
Epoch 3/8
19914/19914 [==============================] - 1s 46us/step - loss: 0.0412 - acc: 0.9868 - val_loss: 0.0254 - val_acc: 0.9916
Epoch 4/8
19914/19914 [==============================] - 1s 48us/step - loss: 0.0372 - acc: 0.9875 - val_loss: 0.0266 - val_acc: 0.9906
Epoch 5/8
19914/19914 [==============================] - 1s 47us/step - loss: 0.0334 - acc: 0.9883 - val_loss: 0.0241 - val_acc: 0.9920
Epoch 6/8
19914/19914 [==============================] - 1s 47us/step - loss: 0.0322 - acc: 0.9881 - val_loss: 0.0248 - val_acc: 0.9918
Epoch 7/8
19914/19914 [==============================] - 1s 46us/step - loss: 0.0313 - acc: 0.9898 - val_loss: 0.0293 - val_acc: 0.9875
Epoch 8/8
19914/19914 [==============================] - 1s 46us/step - loss: 0.0356 - acc: 0.9878 - val_loss: 0.0263 - val_acc: 0.9910
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_46 (InputLayer)        (None, 2048)              0         
_________________________________________________________________
dropout_18 (Dropout)         (None, 2048)              0         
_________________________________________________________________
dense_18 (Dense)             (None, 2)                 4098      
=================================================================
Total params: 4,098
Trainable params: 4,098
Non-trainable params: 0
_________________________________________________________________

Building the Classifier and the CAM Model


ResNet50 ends with a (7, 7) pooling layer, but we need the raw convolutional output rather than each activation map collapsed to a single value, so a new cnn_model is built from the second-to-last layer, and the CAM model is built on top of it.

Taking the weighted average of the convolutional activation maps can be understood as a 1x1 convolution without bias. The classifier is simpler: average with GlobalAveragePooling2D, then run the model trained above.

A global average pooling (GAP) layer is usually placed at the end of a CNN for dimensionality reduction. GAP drastically reduces the feature dimension so that the fully connected layer does not have too many parameters; it keeps the strength of each feature while discarding its location. Since this is a classification problem insensitive to location, GAP works very well here.

We can take a weighted average of the convolutional output, with weights given by the GAP layer's connection to the class, as shown in the figure below:

This yields a CAM (Class Activation Mapping) visualization, i.e. a class activation map. In short, one extra layer is added after the last convolutional layer.

Reference: http://cnnlocalization.csail.mit.edu/

$\mathrm{cam} = (P - 0.5) \times w \times \mathrm{output}$

  • cam: the class activation map, 7x7
  • P: the predicted cat/dog probability
  • output: the convolutional layer's output, 7x7x2048
  • w: the dense weights from the GAP features to the class, 2048x1
In [235]:
# fetch the dense-layer weights of the model just trained
weights = model.get_weights()[0]

width = img_size  # input size, 224 (width was otherwise undefined in this cell)
cnn_model = ResNet50(include_top = False, input_shape=(width, width, 3), weights = 'imagenet')
cnn_model = Model(cnn_model.input, cnn_model.layers[-2].output, name = 'resnet50')

inputs = Input((width, width, 3))
x = inputs
x = cnn_model(x)
cam = Conv2D(2, 1, use_bias = False, name = 'cam')(x)
model_cam = Model(inputs, cam)

x = GlobalAveragePooling2D(name = 'gap')(x)
x = model(x)
model_clf = Model(inputs, x)

To load the weights, reshape them from (2048, 2) to (1, 1, 2048, 2) and load them into model_cam's final 1x1 convolutional layer.

In [236]:
model_cam.layers[-1].set_weights([weights.reshape((1, 1, 2048, 2))])

Saving the Models

In [ ]:
model_clf.save('model_clf.h5')
model_cam.save('model_cam.h5')

Visualizing the Models

In [14]:
from keras.models import *
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

model_clf = load_model('model_clf.h5')
SVG(model_to_dot(model_clf, show_shapes=True).create(prog='dot', format='svg'))
/Users/heweihua/anaconda3/lib/python3.7/site-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
Out[14]:
[model graph: input_5 (InputLayer, 224x224x3) -> resnet50 (Model, 7x7x2048) -> gap (GlobalAveragePooling2D, 2048) -> prediction (Model, 2)]
In [15]:
model_cam = load_model('model_cam.h5')
SVG(model_to_dot(model_cam, show_shapes=True).create(prog='dot', format='svg'))
/Users/heweihua/anaconda3/lib/python3.7/site-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
Out[15]:
[model graph: input_5 (InputLayer, 224x224x3) -> resnet50 (Model, 7x7x2048) -> cam (Conv2D, 7x7x2)]

Visualization Test


Using the two models just built, we try the visualization. First, model_clf predicts whether the image is a cat or a dog. It outputs two probabilities, the first for cat and the second for dog; we take the cat probability, prediction[0, 0]. Then model_cam outputs the CAM visualizations; its output shape is (1, 7, 7, 2), and cam[0, :, :, 1 if prediction < 0.5 else 0] extracts the CAM for the predicted class. Some adjustment follows: raw CAM values span roughly -5 to 30 with a mean around 6, and after several rounds of tuning, rescaling by about a factor of 10 visualizes well; the values are then limited to the 0~1 range and converted to uint8 (the colorizing step requires uint8). The CAM is colorized with OpenCV's colormap functions (see the color bar in the reference below); the COLORMAP_JET style was chosen.

Reference: http://docs.opencv.org/trunk/d3/d50/group__imgproc__colormap.html

Finally, the colored heatmap is overlaid on the original image to produce the final visualization.

In [246]:
import matplotlib.pyplot as plt
import random
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

# run predictions on random test images
plt.figure(figsize=(12, 14))
for i in range(16):
    plt.subplot(4, 4, i+1)
    img = cv2.imread('test/%d.jpg' % random.randint(1, 12500))
    img = cv2.resize(img, (224, 224))
    prediction = model_clf.predict(np.expand_dims(img, 0))
    # prediction = [P(cat), P(dog)]; e.g. [9.4e-10, 1.0] means dog
    prediction = prediction[0, 0]
    if prediction < 0.5:
        plt.title('dog %.2f%%' % (100 - prediction*100))
    else:
        plt.title('cat %.2f%%' % (prediction*100))
    cam = model_cam.predict(np.expand_dims(img, 0))
    cam = cam[0, :, :, 1 if prediction < 0.5 else 0]

    # rescale the CAM into a displayable range
    cam -= cam.min()
    cam /= cam.max()
    cam -= 0.2
    cam /= 0.8
    cam = cv2.resize(cam, (224, 224))

    # colorize the CAM with the JET colormap
    heatmap = cv2.applyColorMap(np.uint8(255*cam), cv2.COLORMAP_JET)
    heatmap[np.where(cam <= 0.2)] = 0

    # overlay the heatmap on the original image
    out = cv2.addWeighted(img, 0.8, heatmap, 0.4, 0)

    # show the result
    plt.axis('off')
    plt.imshow(out[:,:,::-1])

Exporting the mlmodel files: in iOS 11 a Keras model can be used directly; Apple's Core ML Tools converts a Keras model saved in HDF5 format into Apple's mlmodel format. Reference: https://developer.apple.com/documentation/coreml/converting_trained_models_to_core_ml

In [11]:
from coremltools.converters.keras import convert
import coremltools

# Core ML adds the bias to each input channel (y = scale*x + bias), so the
# ImageNet means are passed as negative biases to reproduce the mean subtraction
coreml_model = convert('model_clf.h5', blue_bias=-103.939, green_bias=-116.779, red_bias=-123.68,
                       input_names=['image'], image_input_names='image', output_names='prediction')
coreml_model.author = 'weihua.he'
coreml_model.short_description = 'Dogs vs Cats'
coreml_model.license = 'MIT'  # specify the license
coreml_model.input_description['image'] = 'A 224x224 Image.'
coreml_model.output_description['prediction'] = 'The probability of Dog and Cat.'
coreml_model.save('model_clf.mlmodel')

# CAM model
coreml_model = convert('model_cam.h5', blue_bias=-103.939, green_bias=-116.779, red_bias=-123.68,
                       input_names=['image'], image_input_names='image', output_names='cam')
coreml_model.author = 'weihua.he'
coreml_model.short_description = 'Dogs vs Cats'
coreml_model.license = 'MIT'
coreml_model.input_description['image'] = 'A 224x224 Image.'
coreml_model.output_description['cam'] = 'The cam Image.'
coreml_model.save('model_cam.mlmodel')
0 : input_5, <keras.engine.input_layer.InputLayer object at 0xdd35887b8>
1 : resnet50_conv1, <keras.layers.convolutional.Conv2D object at 0xdd35880b8>
2 : resnet50_bn_conv1, <keras.layers.normalization.BatchNormalization object at 0xdd359e470>
3 : resnet50_activation_50, <keras.layers.core.Activation object at 0xdd359ecf8>
4 : resnet50_max_pooling2d_2, <keras.layers.pooling.MaxPooling2D object at 0xdd359e080>
5 : resnet50_res2a_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd35880f0>
6 : resnet50_bn2a_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd35b5160>
7 : resnet50_activation_51, <keras.layers.core.Activation object at 0xdd35b5eb8>
8 : resnet50_res2a_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd35b50f0>
9 : resnet50_bn2a_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd35cb5f8>
10 : resnet50_activation_52, <keras.layers.core.Activation object at 0xdd35cb160>
11 : resnet50_res2a_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd35cbb38>
12 : resnet50_res2a_branch1, <keras.layers.convolutional.Conv2D object at 0xdd35cb080>
13 : resnet50_bn2a_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd35e37b8>
14 : resnet50_bn2a_branch1, <keras.layers.normalization.BatchNormalization object at 0xdd35e3278>
15 : resnet50_add_17, <keras.layers.merge.Add object at 0xdd35e36a0>
16 : resnet50_activation_53, <keras.layers.core.Activation object at 0xdd35e3080>
17 : resnet50_res2b_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd35e3320>
18 : resnet50_bn2b_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd35f72b0>
19 : resnet50_activation_54, <keras.layers.core.Activation object at 0xdd35f7cf8>
20 : resnet50_res2b_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd35f70b8>
21 : resnet50_bn2b_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd359ebe0>
22 : resnet50_activation_55, <keras.layers.core.Activation object at 0xdd360f2b0>
23 : resnet50_res2b_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd360f978>
24 : resnet50_bn2b_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd360f0f0>
25 : resnet50_add_18, <keras.layers.merge.Add object at 0xdd35f7be0>
26 : resnet50_activation_56, <keras.layers.core.Activation object at 0xdd3628470>
27 : resnet50_res2c_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd36285f8>
28 : resnet50_bn2c_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd3628278>
29 : resnet50_activation_57, <keras.layers.core.Activation object at 0xdd36286a0>
30 : resnet50_res2c_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd3628080>
31 : resnet50_bn2c_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd36407b8>
32 : resnet50_activation_58, <keras.layers.core.Activation object at 0xdd3640278>
33 : resnet50_res2c_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd3640cf8>
34 : resnet50_bn2c_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd3640da0>
35 : resnet50_add_19, <keras.layers.merge.Add object at 0xdd360fbe0>
36 : resnet50_activation_59, <keras.layers.core.Activation object at 0xdd36542b0>
37 : resnet50_res3a_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd3654978>
38 : resnet50_bn3a_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd36540f0>
39 : resnet50_activation_60, <keras.layers.core.Activation object at 0xdd3640be0>
40 : resnet50_res3a_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd366c470>
41 : resnet50_bn3a_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd366cb38>
42 : resnet50_activation_61, <keras.layers.core.Activation object at 0xdd366c0f0>
43 : resnet50_res3a_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd366c6a0>
44 : resnet50_res3a_branch1, <keras.layers.convolutional.Conv2D object at 0xdd3683400>
45 : resnet50_bn3a_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd3683cf8>
46 : resnet50_bn3a_branch1, <keras.layers.normalization.BatchNormalization object at 0xdd3683080>
47 : resnet50_add_20, <keras.layers.merge.Add object at 0xdd3654be0>
48 : resnet50_activation_62, <keras.layers.core.Activation object at 0xdd3697400>
49 : resnet50_res3b_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd36977b8>
50 : resnet50_bn3b_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd36970b8>
51 : resnet50_activation_63, <keras.layers.core.Activation object at 0xdd3697320>
52 : resnet50_res3b_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd3697da0>
53 : resnet50_bn3b_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd36b1978>
54 : resnet50_activation_64, <keras.layers.core.Activation object at 0xdd36b10b8>
55 : resnet50_res3b_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd36b1eb8>
56 : resnet50_bn3b_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd36c7470>
57 : resnet50_add_21, <keras.layers.merge.Add object at 0xdd36c7978>
58 : resnet50_activation_65, <keras.layers.core.Activation object at 0xdd36c7160>
59 : resnet50_res3c_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd36c7b38>
60 : resnet50_bn3c_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd36c7080>
61 : resnet50_activation_66, <keras.layers.core.Activation object at 0xdd3683be0>
62 : resnet50_res3c_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd36de400>
63 : resnet50_bn3c_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd36decf8>
64 : resnet50_activation_67, <keras.layers.core.Activation object at 0xdd36de080>
65 : resnet50_res3c_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd36c7be0>
66 : resnet50_bn3c_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd3515da0>
67 : resnet50_add_22, <keras.layers.merge.Add object at 0xdd36de320>
68 : resnet50_activation_68, <keras.layers.core.Activation object at 0xdd34ffe80>
69 : resnet50_res3d_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd34ffcc0>
70 : resnet50_bn3d_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd34e80b8>
71 : resnet50_activation_69, <keras.layers.core.Activation object at 0xdd3515978>
72 : resnet50_res3d_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd34d00b8>
73 : resnet50_bn3d_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd34bbe80>
74 : resnet50_activation_70, <keras.layers.core.Activation object at 0xdd34e8978>
75 : resnet50_res3d_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd34a5e80>
76 : resnet50_bn3d_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd3489278>
77 : resnet50_add_23, <keras.layers.merge.Add object at 0xdd34bb940>
78 : resnet50_activation_71, <keras.layers.core.Activation object at 0xdd3475240>
79 : resnet50_res4a_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd3489940>
80 : resnet50_bn4a_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd3449da0>
81 : resnet50_activation_72, <keras.layers.core.Activation object at 0xdd3475940>
82 : resnet50_res4a_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd3431da0>
83 : resnet50_bn4a_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd3419be0>
84 : resnet50_activation_73, <keras.layers.core.Activation object at 0xdd3449940>
85 : resnet50_res4a_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd3404be0>
86 : resnet50_res4a_branch1, <keras.layers.convolutional.Conv2D object at 0xdd33eb0b8>
87 : resnet50_bn4a_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd33d6e80>
88 : resnet50_bn4a_branch1, <keras.layers.normalization.BatchNormalization object at 0xdd3419978>
89 : resnet50_add_24, <keras.layers.merge.Add object at 0xdca878908>
90 : resnet50_activation_74, <keras.layers.core.Activation object at 0xdd33a9080>
91 : resnet50_res4b_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd33a9eb8>
92 : resnet50_bn4b_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd337ada0>
93 : resnet50_activation_75, <keras.layers.core.Activation object at 0xdd33bf940>
94 : resnet50_res4b_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd33666a0>
95 : resnet50_bn4b_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd334d240>
96 : resnet50_activation_76, <keras.layers.core.Activation object at 0xdd337a940>
97 : resnet50_res4b_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd334d5c0>
98 : resnet50_bn4b_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd3308080>
99 : resnet50_add_25, <keras.layers.merge.Add object at 0xdd3337978>
100 : resnet50_activation_77, <keras.layers.core.Activation object at 0xdd32f1278>
101 : resnet50_res4c_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd3308940>
102 : resnet50_bn4c_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd32c3080>
103 : resnet50_activation_78, <keras.layers.core.Activation object at 0xdd32f1940>
104 : resnet50_res4c_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd32aeb00>
105 : resnet50_bn4c_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd327fda0>
106 : resnet50_activation_79, <keras.layers.core.Activation object at 0xdd32c32b0>
107 : resnet50_res4c_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd326a080>
108 : resnet50_bn4c_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd327f978>
109 : resnet50_add_26, <keras.layers.merge.Add object at 0xdd32512b0>
110 : resnet50_activation_80, <keras.layers.core.Activation object at 0xdd3227da0>
111 : resnet50_res4d_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd3227be0>
112 : resnet50_bn4d_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd320d278>
113 : resnet50_activation_81, <keras.layers.core.Activation object at 0xdd323ae80>
114 : resnet50_res4d_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd320d940>
115 : resnet50_bn4d_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd31ca080>
116 : resnet50_activation_82, <keras.layers.core.Activation object at 0xdd31f65c0>
117 : resnet50_res4d_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd31b20b8>
118 : resnet50_bn4d_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd3187dd8>
119 : resnet50_add_27, <keras.layers.merge.Add object at 0xdd31ca5c0>
120 : resnet50_activation_83, <keras.layers.core.Activation object at 0xdd31720f0>
121 : resnet50_res4e_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd3172eb8>
122 : resnet50_bn4e_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd3148dd8>
123 : resnet50_activation_84, <keras.layers.core.Activation object at 0xdd31875f8>
124 : resnet50_res4e_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd31306d8>
125 : resnet50_bn4e_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd311b2b0>
126 : resnet50_activation_85, <keras.layers.core.Activation object at 0xdd3148978>
127 : resnet50_res4e_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd30f1dd8>
128 : resnet50_bn4e_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd30db0f0>
129 : resnet50_add_28, <keras.layers.merge.Add object at 0xdd31042e8>
130 : resnet50_activation_86, <keras.layers.core.Activation object at 0xdd30c7278>
131 : resnet50_res4f_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd30db2e8>
132 : resnet50_bn4f_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd309a0f0>
133 : resnet50_activation_87, <keras.layers.core.Activation object at 0xdd30c79b0>
134 : resnet50_res4f_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd3087b38>
135 : resnet50_bn4f_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd305bdd8>
136 : resnet50_activation_88, <keras.layers.core.Activation object at 0xdd309a2e8>
137 : resnet50_res4f_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd30440b8>
138 : resnet50_bn4f_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd30302b0>
139 : resnet50_add_29, <keras.layers.merge.Add object at 0xdd305b978>
140 : resnet50_activation_89, <keras.layers.core.Activation object at 0xdd3004dd8>
141 : resnet50_res5a_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd3004c18>
142 : resnet50_bn5a_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd301bef0>
143 : resnet50_activation_90, <keras.layers.core.Activation object at 0xdd2ff07b8>
144 : resnet50_res5a_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd35436a0>
145 : resnet50_bn5a_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd31fd240>
146 : resnet50_activation_91, <keras.layers.core.Activation object at 0xdd31fd358>
147 : resnet50_res5a_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd31fd390>
148 : resnet50_res5a_branch1, <keras.layers.convolutional.Conv2D object at 0xdd31fd518>
149 : resnet50_bn5a_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd31fd6a0>
150 : resnet50_bn5a_branch1, <keras.layers.normalization.BatchNormalization object at 0xdd31fd7b8>
151 : resnet50_add_30, <keras.layers.merge.Add object at 0xdd31fd8d0>
152 : resnet50_activation_92, <keras.layers.core.Activation object at 0xdd31fd908>
153 : resnet50_res5b_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd31fd940>
154 : resnet50_bn5b_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd31fdac8>
155 : resnet50_activation_93, <keras.layers.core.Activation object at 0xdd31fdbe0>
156 : resnet50_res5b_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd31fdc18>
157 : resnet50_bn5b_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd31fdda0>
158 : resnet50_activation_94, <keras.layers.core.Activation object at 0xdd31fdeb8>
159 : resnet50_res5b_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd31fdef0>
160 : resnet50_bn5b_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd36240b8>
161 : resnet50_add_31, <keras.layers.merge.Add object at 0xdd36241d0>
162 : resnet50_activation_95, <keras.layers.core.Activation object at 0xdd3624208>
163 : resnet50_res5c_branch2a, <keras.layers.convolutional.Conv2D object at 0xdd3624240>
164 : resnet50_bn5c_branch2a, <keras.layers.normalization.BatchNormalization object at 0xdd36243c8>
165 : resnet50_activation_96, <keras.layers.core.Activation object at 0xdd36244e0>
166 : resnet50_res5c_branch2b, <keras.layers.convolutional.Conv2D object at 0xdd3624518>
167 : resnet50_bn5c_branch2b, <keras.layers.normalization.BatchNormalization object at 0xdd36246a0>
168 : resnet50_activation_97, <keras.layers.core.Activation object at 0xdd36247b8>
169 : resnet50_res5c_branch2c, <keras.layers.convolutional.Conv2D object at 0xdd36247f0>
170 : resnet50_bn5c_branch2c, <keras.layers.normalization.BatchNormalization object at 0xdd3624978>
171 : resnet50_add_32, <keras.layers.merge.Add object at 0xdd3624a90>
172 : resnet50_activation_98, <keras.layers.core.Activation object at 0xdd3624ac8>
173 : gap, <keras.layers.pooling.GlobalAveragePooling2D object at 0xdd3624b00>
174 : prediction_dense_1, <keras.layers.core.Dense object at 0xdd14457b8>
175 : prediction_dense_1__activation__, <keras.layers.core.Activation object at 0xdd5bc8160>
0 : input_5, <keras.engine.input_layer.InputLayer object at 0xddc03d400>
1 : resnet50_conv1, <keras.layers.convolutional.Conv2D object at 0xddc052828>
2 : resnet50_bn_conv1, <keras.layers.normalization.BatchNormalization object at 0xddc052630>
3 : resnet50_activation_50, <keras.layers.core.Activation object at 0xddc052cc0>
4 : resnet50_max_pooling2d_2, <keras.layers.pooling.MaxPooling2D object at 0xddc066438>
5 : resnet50_res2a_branch2a, <keras.layers.convolutional.Conv2D object at 0xddc0664e0>
6 : resnet50_bn2a_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddc07a470>
7 : resnet50_activation_51, <keras.layers.core.Activation object at 0xddc07ac18>
8 : resnet50_res2a_branch2b, <keras.layers.convolutional.Conv2D object at 0xddc07a208>
9 : resnet50_bn2a_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddc07aac8>
10 : resnet50_activation_52, <keras.layers.core.Activation object at 0xddc03d048>
11 : resnet50_res2a_branch2c, <keras.layers.convolutional.Conv2D object at 0xddc08fa20>
12 : resnet50_res2a_branch1, <keras.layers.convolutional.Conv2D object at 0xddc08f2e8>
13 : resnet50_bn2a_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddc0a0828>
14 : resnet50_bn2a_branch1, <keras.layers.normalization.BatchNormalization object at 0xddc0a0208>
15 : resnet50_add_17, <keras.layers.merge.Add object at 0xddc0a0400>
16 : resnet50_activation_53, <keras.layers.core.Activation object at 0xddc0a0cc0>
17 : resnet50_res2b_branch2a, <keras.layers.convolutional.Conv2D object at 0xddc0a0ac8>
18 : resnet50_bn2b_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddc0b60f0>
19 : resnet50_activation_54, <keras.layers.core.Activation object at 0xddc0b64e0>
20 : resnet50_res2b_branch2b, <keras.layers.convolutional.Conv2D object at 0xddc0b62e8>
21 : resnet50_bn2b_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddc0cc828>
22 : resnet50_activation_55, <keras.layers.core.Activation object at 0xddc0cc208>
23 : resnet50_res2b_branch2c, <keras.layers.convolutional.Conv2D object at 0xddc0cce10>
24 : resnet50_bn2b_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddc07a048>
25 : resnet50_add_18, <keras.layers.merge.Add object at 0xddc0e0a20>
26 : resnet50_activation_56, <keras.layers.core.Activation object at 0xddc0e00f0>
27 : resnet50_res2c_branch2a, <keras.layers.convolutional.Conv2D object at 0xddc0e0c18>
28 : resnet50_bn2c_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddc0e0cc0>
29 : resnet50_activation_57, <keras.layers.core.Activation object at 0xddc0cc048>
30 : resnet50_res2c_branch2b, <keras.layers.convolutional.Conv2D object at 0xddc0f4438>
31 : resnet50_bn2c_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddc0f44e0>
32 : resnet50_activation_58, <keras.layers.core.Activation object at 0xddc0e0048>
33 : resnet50_res2c_branch2c, <keras.layers.convolutional.Conv2D object at 0xddc109630>
34 : resnet50_bn2c_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddc109208>
35 : resnet50_add_19, <keras.layers.merge.Add object at 0xddc109400>
36 : resnet50_activation_59, <keras.layers.core.Activation object at 0xddc109cc0>
37 : resnet50_res3a_branch2a, <keras.layers.convolutional.Conv2D object at 0xddc109ac8>
38 : resnet50_bn3a_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddc11f0f0>
39 : resnet50_activation_60, <keras.layers.core.Activation object at 0xddc11f4e0>
40 : resnet50_res3a_branch2b, <keras.layers.convolutional.Conv2D object at 0xddc11f2e8>
41 : resnet50_bn3a_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddc134828>
42 : resnet50_activation_61, <keras.layers.core.Activation object at 0xddc134208>
43 : resnet50_res3a_branch2c, <keras.layers.convolutional.Conv2D object at 0xddc134e10>
44 : resnet50_res3a_branch1, <keras.layers.convolutional.Conv2D object at 0xddc0f4048>
45 : resnet50_bn3a_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddc144c18>
46 : resnet50_bn3a_branch1, <keras.layers.normalization.BatchNormalization object at 0xddc1442e8>
47 : resnet50_add_20, <keras.layers.merge.Add object at 0xddc134048>
48 : resnet50_activation_62, <keras.layers.core.Activation object at 0xddc15c470>
49 : resnet50_res3b_branch2a, <keras.layers.convolutional.Conv2D object at 0xddc15c828>
50 : resnet50_bn3b_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddc15c240>
51 : resnet50_activation_63, <keras.layers.core.Activation object at 0xddc15cac8>
52 : resnet50_res3b_branch2b, <keras.layers.convolutional.Conv2D object at 0xddc144048>
53 : resnet50_bn3b_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddc16fc18>
54 : resnet50_activation_64, <keras.layers.core.Activation object at 0xddc16f2e8>
55 : resnet50_res3b_branch2c, <keras.layers.convolutional.Conv2D object at 0xddc16f400>
56 : resnet50_bn3b_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddc184438>
57 : resnet50_add_21, <keras.layers.merge.Add object at 0xddc184e10>
58 : resnet50_activation_65, <keras.layers.core.Activation object at 0xddc184240>
59 : resnet50_res3c_branch2a, <keras.layers.convolutional.Conv2D object at 0xddc1844e0>
60 : resnet50_bn3c_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddc199470>
61 : resnet50_activation_66, <keras.layers.core.Activation object at 0xddc199c18>
62 : resnet50_res3c_branch2b, <keras.layers.convolutional.Conv2D object at 0xddc199208>
63 : resnet50_bn3c_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddc199ac8>
64 : resnet50_activation_67, <keras.layers.core.Activation object at 0xddc15c048>
65 : resnet50_res3c_branch2c, <keras.layers.convolutional.Conv2D object at 0xddc1aaa20>
66 : resnet50_bn3c_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddc1aa2e8>
67 : resnet50_add_22, <keras.layers.merge.Add object at 0xddc199048>
68 : resnet50_activation_68, <keras.layers.core.Activation object at 0xddc1c2470>
69 : resnet50_res3d_branch2a, <keras.layers.convolutional.Conv2D object at 0xddc1c2828>
70 : resnet50_bn3d_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddc1c2240>
71 : resnet50_activation_69, <keras.layers.core.Activation object at 0xddc1aa048>
72 : resnet50_res3d_branch2b, <keras.layers.convolutional.Conv2D object at 0xddc013c18>
73 : resnet50_bn3d_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddbfedcc0>
74 : resnet50_activation_70, <keras.layers.core.Activation object at 0xddc1c24e0>
75 : resnet50_res3d_branch2c, <keras.layers.convolutional.Conv2D object at 0xddbdd6048>
76 : resnet50_bn3d_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddbdc34e0>
77 : resnet50_add_23, <keras.layers.merge.Add object at 0xddbfed0f0>
78 : resnet50_activation_71, <keras.layers.core.Activation object at 0xddbdafdd8>
79 : resnet50_res4a_branch2a, <keras.layers.convolutional.Conv2D object at 0xddbdc39e8>
80 : resnet50_bn4a_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddbd84438>
81 : resnet50_activation_72, <keras.layers.core.Activation object at 0xddbdaf0f0>
82 : resnet50_res4a_branch2b, <keras.layers.convolutional.Conv2D object at 0xddbd70ac8>
83 : resnet50_bn4a_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddbd5b240>
84 : resnet50_activation_73, <keras.layers.core.Activation object at 0xddbd84a20>
85 : resnet50_res4a_branch2c, <keras.layers.convolutional.Conv2D object at 0xddbd49208>
86 : resnet50_res4a_branch1, <keras.layers.convolutional.Conv2D object at 0xddbd21ac8>
87 : resnet50_bn4a_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddbd0acc0>
88 : resnet50_bn4a_branch1, <keras.layers.normalization.BatchNormalization object at 0xddbd5b438>
89 : resnet50_add_24, <keras.layers.merge.Add object at 0xddbd0aa20>
90 : resnet50_activation_74, <keras.layers.core.Activation object at 0xddbce2400>
91 : resnet50_res4b_branch2a, <keras.layers.convolutional.Conv2D object at 0xddbce2240>
92 : resnet50_bn4b_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddbcf60f0>
93 : resnet50_activation_75, <keras.layers.core.Activation object at 0xddbcce9e8>
94 : resnet50_res4b_branch2b, <keras.layers.convolutional.Conv2D object at 0xddbca6ac8>
95 : resnet50_bn4b_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddbc92cc0>
96 : resnet50_activation_76, <keras.layers.core.Activation object at 0xddbcb90f0>
97 : resnet50_res4b_branch2c, <keras.layers.convolutional.Conv2D object at 0xddbc7e208>
98 : resnet50_bn4b_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddbc54438>
99 : resnet50_add_25, <keras.layers.merge.Add object at 0xddbc929e8>
100 : resnet50_activation_77, <keras.layers.core.Activation object at 0xddbc40ac8>
101 : resnet50_res4c_branch2a, <keras.layers.convolutional.Conv2D object at 0xddbc54a20>
102 : resnet50_bn4c_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddbc16be0>
103 : resnet50_activation_78, <keras.layers.core.Activation object at 0xddbc400f0>
104 : resnet50_res4c_branch2b, <keras.layers.convolutional.Conv2D object at 0xddbc169e8>
105 : resnet50_bn4c_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddbbd94e0>
106 : resnet50_activation_79, <keras.layers.core.Activation object at 0xddbc032e8>
107 : resnet50_res4c_branch2c, <keras.layers.convolutional.Conv2D object at 0xddbbd9630>
108 : resnet50_bn4c_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddbb9c4e0>
109 : resnet50_add_26, <keras.layers.merge.Add object at 0xddbbc4470>
110 : resnet50_activation_80, <keras.layers.core.Activation object at 0xddbb9c400>
111 : resnet50_res4d_branch2a, <keras.layers.convolutional.Conv2D object at 0xddbb72320>
112 : resnet50_bn4d_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddbb87e10>
113 : resnet50_activation_81, <keras.layers.core.Activation object at 0xddbb5e518>
114 : resnet50_res4d_branch2b, <keras.layers.convolutional.Conv2D object at 0xddbb31da0>
115 : resnet50_bn4d_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddbb1aeb8>
116 : resnet50_activation_82, <keras.layers.core.Activation object at 0xddbb48978>
117 : resnet50_res4d_branch2c, <keras.layers.convolutional.Conv2D object at 0xddbb1a5c0>
118 : resnet50_bn4d_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddbad5be0>
119 : resnet50_add_27, <keras.layers.merge.Add object at 0xddbb022b0>
120 : resnet50_activation_83, <keras.layers.core.Activation object at 0xddbabebe0>
121 : resnet50_res4e_branch2a, <keras.layers.convolutional.Conv2D object at 0xddbabe240>
122 : resnet50_bn4e_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddba91da0>
123 : resnet50_activation_84, <keras.layers.core.Activation object at 0xddbad5978>
124 : resnet50_res4e_branch2b, <keras.layers.convolutional.Conv2D object at 0xddba7a0b8>
125 : resnet50_bn4e_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddba912b0>
126 : resnet50_activation_85, <keras.layers.core.Activation object at 0xddba62940>
127 : resnet50_res4e_branch2c, <keras.layers.convolutional.Conv2D object at 0xddba37da0>
128 : resnet50_bn4e_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddba20eb8>
129 : resnet50_add_28, <keras.layers.merge.Add object at 0xddba4e940>
130 : resnet50_activation_86, <keras.layers.core.Activation object at 0xddba202b0>
131 : resnet50_res4f_branch2a, <keras.layers.convolutional.Conv2D object at 0xddb9f6da0>
132 : resnet50_bn4f_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddb9e0be0>
133 : resnet50_activation_87, <keras.layers.core.Activation object at 0xddba09eb8>
134 : resnet50_res4f_branch2b, <keras.layers.convolutional.Conv2D object at 0xddb9ca240>
135 : resnet50_bn4f_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddb99fda0>
136 : resnet50_activation_88, <keras.layers.core.Activation object at 0xddb9e0978>
137 : resnet50_res4f_branch2c, <keras.layers.convolutional.Conv2D object at 0xddb98a0b8>
138 : resnet50_bn4f_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddb99fe80>
139 : resnet50_add_29, <keras.layers.merge.Add object at 0xddb976940>
140 : resnet50_activation_89, <keras.layers.core.Activation object at 0xddb9486a0>
141 : resnet50_res5a_branch2a, <keras.layers.convolutional.Conv2D object at 0xddb948080>
142 : resnet50_bn5a_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddb95f5c0>
143 : resnet50_activation_90, <keras.layers.core.Activation object at 0xddb935eb8>
144 : resnet50_res5a_branch2b, <keras.layers.convolutional.Conv2D object at 0xddb90ada0>
145 : resnet50_bn5a_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddb8f3080>
146 : resnet50_activation_91, <keras.layers.core.Activation object at 0xddb91f978>
147 : resnet50_res5a_branch2c, <keras.layers.convolutional.Conv2D object at 0xddb8f35f8>
148 : resnet50_res5a_branch1, <keras.layers.convolutional.Conv2D object at 0xddb8b0780>
149 : resnet50_bn5a_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddc027ac8>
150 : resnet50_bn5a_branch1, <keras.layers.normalization.BatchNormalization object at 0xddb8df2b0>
151 : resnet50_add_30, <keras.layers.merge.Add object at 0xddbc37358>
152 : resnet50_activation_92, <keras.layers.core.Activation object at 0xddbc37390>
153 : resnet50_res5b_branch2a, <keras.layers.convolutional.Conv2D object at 0xddbc373c8>
154 : resnet50_bn5b_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddbc37550>
155 : resnet50_activation_93, <keras.layers.core.Activation object at 0xddbc37668>
156 : resnet50_res5b_branch2b, <keras.layers.convolutional.Conv2D object at 0xddbc376a0>
157 : resnet50_bn5b_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddbc37828>
158 : resnet50_activation_94, <keras.layers.core.Activation object at 0xddbc37940>
159 : resnet50_res5b_branch2c, <keras.layers.convolutional.Conv2D object at 0xddbc37978>
160 : resnet50_bn5b_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddbc37b00>
161 : resnet50_add_31, <keras.layers.merge.Add object at 0xddbc37c18>
162 : resnet50_activation_95, <keras.layers.core.Activation object at 0xddbc37c50>
163 : resnet50_res5c_branch2a, <keras.layers.convolutional.Conv2D object at 0xddbc37c88>
164 : resnet50_bn5c_branch2a, <keras.layers.normalization.BatchNormalization object at 0xddbc37e10>
165 : resnet50_activation_96, <keras.layers.core.Activation object at 0xddbc37f28>
166 : resnet50_res5c_branch2b, <keras.layers.convolutional.Conv2D object at 0xddbc37f60>
167 : resnet50_bn5c_branch2b, <keras.layers.normalization.BatchNormalization object at 0xddbb3d128>
168 : resnet50_activation_97, <keras.layers.core.Activation object at 0xddbb3d240>
169 : resnet50_res5c_branch2c, <keras.layers.convolutional.Conv2D object at 0xddbb3d278>
170 : resnet50_bn5c_branch2c, <keras.layers.normalization.BatchNormalization object at 0xddbb3d400>
171 : resnet50_add_32, <keras.layers.merge.Add object at 0xddbb3d518>
172 : resnet50_activation_98, <keras.layers.core.Activation object at 0xddbb3d550>
173 : cam, <keras.layers.convolutional.Conv2D object at 0xddbb3d588>

Drop the exported model into a new Xcode project and it is ready to use; the camera-based recognition uses OpenCV's camera capture. The iOS code is omitted here for space and because it is not the focus of this report; see GitHub: https://github.com/bjheweihua/cats_vs_dogs
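For reference, the layer listing above is the verbose output printed by the coremltools Keras converter during export. A minimal sketch of that export step, assuming coremltools 2.x and illustrative file names (adjust to the actual saved Keras model):

import coremltools

# Hypothetical paths; point these at the actual trained Keras model.
coreml_model = coremltools.converters.keras.convert(
    'cats_vs_dogs_cam.h5',
    input_names='image',
    image_input_names='image',
)
coreml_model.save('CatsVsDogs.mlmodel')

The resulting .mlmodel file is what gets dragged into the Xcode project.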

Running the app on an iPhone, the camera recognition results are shown in the figure below:

V. Results Analysis


Single ResNet50 model, trained for 8 epochs:

  • Training set: loss 0.0356, acc 0.9878; validation set: val_loss 0.0263, val_acc 0.9910.

Transfer learning with the four models ResNet50, Xception, InceptionV3 and InceptionResNetV2 fused, trained for 8 epochs:

  • Training set: loss 0.0096, acc 0.9972; validation set: val_loss 0.0170, val_acc 0.9960.
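These loss values are binary cross-entropy, which is exactly the LogLoss metric Kaggle scores on, so a model's leaderboard score can be estimated offline before submitting. A quick check with scikit-learn, using a few illustrative labels and predicted probabilities:

import numpy as np
from sklearn.metrics import log_loss

y_valid = np.array([1, 0, 1, 1])               # illustrative labels (1 = dog)
p_valid = np.array([0.98, 0.03, 0.91, 0.88])   # predicted dog-probabilities
print(log_loss(y_valid, p_valid))              # ~0.068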

ResNet50 has 168 layers, Xception 126, InceptionV3 159, and InceptionResNetV2 572. Their different convolution kernels and structural combinations extract complementary, well-generalizing features from the images, which both raises classification accuracy and lowers the risk of overfitting. As a rule of thumb, deeper networks can reach higher accuracy, although depth alone does not guarantee it.

Fusing ResNet50, Xception, InceptionV3 and InceptionResNetV2 for transfer learning performs better than any single network. This borrows the idea of bagging: running the data through several models and combining their outputs effectively reduces the variance of the final model, curbs overfitting, and improves classification accuracy.
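A minimal sketch of this feature-level fusion, assuming each pre-trained network's bottleneck features (taken after global average pooling) were already exported to the .npy files named below; all file names here are illustrative:

import numpy as np
from keras.models import Model
from keras.layers import Input, Dropout, Dense

# Load the pre-extracted feature vectors and concatenate them per sample.
files = ['gap_ResNet50.npy', 'gap_Xception.npy',
         'gap_InceptionV3.npy', 'gap_InceptionResNetV2.npy']
x_train = np.concatenate([np.load(f) for f in files], axis=1)
y_train = np.load('train_labels.npy')   # 0 = cat, 1 = dog

# A small classifier on top of the fused features.
inputs = Input(shape=(x_train.shape[1],))
x = Dropout(0.5)(inputs)
outputs = Dense(1, activation='sigmoid')(x)
model = Model(inputs, outputs)
model.compile(optimizer='adadelta', loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=128, epochs=8, validation_split=0.2)

Because only this tiny head is trained while the heavy backbones serve as frozen feature extractors, the 8 epochs above finish quickly even without a powerful GPU.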

VI. Summary and Reflections


This project was completed with transfer learning and model fusion; standing on the shoulders of giants makes it easy to benefit from excellent pre-trained weights. I spent several weeks wrestling with my own Mac, which could only run a single model: it overheated badly, the fan roared as if the machine were about to smoke, and it shut itself down several times. After closing every other program, a full run finally went through in a good eight hours. Multi-model fusion, however, was simply impossible on it: the machine threw memory warnings and crashed outright. I looked into AWS and Tencent GPU instances online, but neither worked well for me, and Tencent's was expensive. In the end, a colleague pointed me to free internal server resources, and the project was finished on a company GPU server. The biggest difficulty of this project was training on a cloud platform, and in the process I became familiar with the basic workflow of using one.

In this project, the four models Xception, InceptionV3, ResNet50 and InceptionResNetV2 were used to extract feature vectors, which were then concatenated directly, ignoring any positional relationship between features. Beyond these four, more models could be added, or stacking could be used for model fusion to further reduce variance and improve classification accuracy (a sketch of the stacking idea follows below); a stronger classifier could also be trained on top. I experimented with EfficientNets: the pre-trained releases covered B0 to B5, with B5 reported at 83.2% accuracy, while B6 and B7 were not yet available. For lack of time I did not pursue this path further.
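A hedged sketch of the stacking idea, using scikit-learn and randomly generated stand-ins for the base models' out-of-fold probabilities (all data below is synthetic, purely for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
# Stand-ins for three base models' dog-probabilities on 1000 validation images.
oof_preds = rng.rand(1000, 3)
y_valid = (oof_preds.mean(axis=1) > 0.5).astype(int)   # synthetic labels

meta = LogisticRegression()        # the meta-learner weights the base models
meta.fit(oof_preds, y_valid)

test_preds = rng.rand(500, 3)      # base-model probabilities on the test set
final = meta.predict_proba(test_preds)[:, 1]   # fused dog-probability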

When I first started this project I tried to implement it in raw TensorFlow, ran into many pitfalls, and never got it running on the Mac despite many attempts, so I gave it up. Keras is much friendlier to beginners: simple and stable to use. Although Keras itself runs on the TensorFlow backend, its API is a far better abstraction; only when Keras does not expose some piece of TensorFlow functionality do we need to drop down to the low-level TensorFlow methods.
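A tiny illustration of that last point, mixing a raw TensorFlow op into a Keras computation via the backend module (assuming Keras 2.x on the TensorFlow 1.x backend):

import numpy as np
import tensorflow as tf
from keras import backend as K

x = K.placeholder(shape=(None, 2))
y = tf.reduce_sum(x, axis=1)             # a raw TensorFlow op on a Keras tensor
f = K.function([x], [y])
print(f([np.array([[1.0, 2.0]])])[0])    # -> [3.]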

VII. References


[1] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR, 2015.
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition (ResNet) (Chinese translation).
[3] A step-by-step guide to reaching the top 2% in Kaggle's Dogs vs. Cats: https://yangpeiwen.com/dogs-vs-cats-2
[4] Image classification with Keras (4): transfer learning, a Dogs vs. Cats walkthrough: https://zhuanlan.zhihu.com/p/51889181
[5] Deploying a top-2%-accuracy Dogs vs. Cats web app: https://www.jianshu.com/p/1bc2abe88388
[6] Keras documentation (Chinese): https://keras.io/zh/applications
[7] Capstone: Dogs vs Cats for Udacity P7 (outlier detection): https://zhuanlan.zhihu.com/p/34068451
[8] Building image classification models with very little data: https://keras-cn-docs.readthedocs.io/zh_CN/latest/blog/image_classification_using_very_little_data
[9] Yang Peiwen and Hu Boqiang. Introduction to Image Processing with Deep Learning Techniques. Beijing: Tsinghua University Press, 2018.
[10] Ian Goodfellow, Yoshua Bengio, and Aaron Courville (trans. Zhao Shenjian et al.). Deep Learning. Beijing: Posts & Telecom Press, 2017.
[11] EfficientNet: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet